Abstract - We propose an algorithm to maximize the instantaneous
sum data rate transmitted by a base station in the
downlink of a multiuser multiple-input, multiple-output system. 
The transmitter and the receivers may each be equipped with 
multiple antennas and each user may receive more than one data 
stream. We show that maximizing the sum rate is closely linked 
to minimizing the product of mean squared errors (PMSE). 
The algorithm employs an uplink/downlink duality to iteratively 
design transmit-receive linear precoders, decoders, and power 
allocations that minimize the PMSE for all data streams under 
a sum power constraint. Numerical simulations illustrate the 
effectiveness of the algorithm and support the use of the PMSE 
criterion in maximizing the overall instantaneous data rate. 

I. Introduction 

Multiple-input multiple-output (MIMO) systems continue to 
be an important theme in wireless communications research. 
MIMO technology improves reliability and/or increases the 
data rate of wireless transmission. These performance im- 
provements are achieved by exploiting the spatial dimension 
using an antenna array at the transmitter and/or at the receiver. 
A relatively recent theme has been MIMO systems enabling 
multiuser communications in the downlink - a single base 
station communicating with multiple users. 

Much of the existing work on multiuser MIMO systems fo- 
cuses on minimizing the sum of mean squared errors (SMSE) 
between the transmitted and received signals under a sum 
power constraint [1]-[5]. A common theme to most of this
work is the use of an MSE uplink-downlink duality introduced 
in [5]. The work in [6] provides a comprehensive review of the 
available work in this area including an alternative algorithmic 
approach to this problem. With its focus on SMSE, this body 
of work deals exclusively with maximizing reliability at a fixed 
data rate. In particular, when one considers the behaviour of 
the power allocation step in the SMSE solutions, an "inverse 
waterfilling" type of solution may arise. When starting at an 
optimum point for a fixed power allocation where data streams 
have unequal powers, incremental power that is allocated to 
the system will be assigned to the worst of the active data 
streams. This is required under the SMSE criterion, as the 
worst data stream's MSE dominates the average (and thus, the 
sum) MSE. 

This exclusive focus on minimizing error rate appears to 
hold contrary to an important motivation in deploying MIMO 
systems: increasing data rate. The problem of maximizing data 
rate has been studied in depth in information theory, where
sum capacity is attained by maximizing mutual information.
In contrast to SMSE minimization, information theoretical 
approaches apply a waterfilling strategy to assign available 
incremental power to the best data stream [7]-[10]. Unfortunately,
the sum-capacity precoding strategy [11] cannot
be realized practically, and even suboptimal approximations 
(e.g. those employing Tomlinson-Harashima precoding [12]) 
require nonlinear precoding, user ordering, and incur ad- 
ditional complexity. Orthogonalization based methods using 
zero-forcing and block diagonalization allow for a simple for- 
mulation of the sum capacity [13], but the resulting constraint 
on the number of receive antennas can severely restrict the 
possibility of receive diversity and/or the associated increase 
in sum capacity. Several papers have looked at the general 
problem of maximizing sum capacity using linear precoding 
for the multiuser downlink with single antenna receivers [14]- 
[16], but only recently has work been performed on the case 
of multiple receive antennas [17]. 

One important connection that we formulate in this paper is 
the relationship between the sum capacity and the product of 
mean squared errors (PMSE). In the single-user multicarrier 
case, minimizing the PMSE is equivalent to minimizing the 
determinant of the MSE matrix and thus is also equivalent to 
maximizing the mutual information [18]. This equivalence can 
also be seen in the relationship developed between minimum 
MSE (MMSE) and mutual information in [19]. The existence 
of these relationships motivates us to consider a PMSE min- 
imizing solution for the multiuser downlink to maximize the 
sum data rate over multiple users, possibly with multiple data 
streams per user, given a maximum allowable transmission 
power and constraints on the error rate of each stream. 

Information theoretical results for achieving sum capacity 
provide an upper bound for achievable performance; however, 
a practical system cannot use Gaussian codebooks in the 
design of its transmit constellations. With this in mind, we 
evaluate the performance of our PMSE minimizing linear 
precoder under adaptive PSK modulation. The resulting al- 
gorithm attempts to maximize the sum data rate, under PSK 
modulation, with a constraint on the bit error rate of each data 
stream. To our knowledge, this form of sum rate maximization 
(as opposed to that performed in a purely information theoretic 
sense) has not been attempted before. 

The remainder of this paper is organized as follows. Section II
states the assumptions made and describes the system
model used. Section III investigates the motivation for using
the product of MSEs as an optimization criterion, and Section IV
proposes an optimization algorithm to minimize the
PMSE under a sum power constraint. Results of simulations
testing the efficacy of the proposed approach are presented in
Section V. Finally, we draw conclusions in Section VI.

Fig. 1. Processing for user k in downlink and virtual uplink.

II. System Model 

The system under consideration, illustrated in Fig. 1, comprises
a base station with $M$ antennas transmitting to $K$
decentralized users. User $k$ is equipped with $N_k$ antennas
and receives $L_k$ data streams from the base station. Thus,
we have $M$ transmit antennas transmitting a total of
$L = \sum_{k=1}^{K} L_k$ symbols to $K$ users, who together have a total of
$N = \sum_{k=1}^{K} N_k$ receive antennas. The data symbols for user $k$
are collected in the data vector $\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kL_k}]^T$
and the overall data vector is $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_K^T]^T$.
We focus here on linear processing at the transmitter and
receiver. Hence, to ensure resolvability we require $L \le M$
and $L_k \le N_k$, $\forall k$.

User $k$'s data streams are processed by the $M \times L_k$ transmit
filter $\mathbf{U}_k = [\mathbf{u}_{k1}, \ldots, \mathbf{u}_{kL_k}]$ before being transmitted over
the $M$ antennas. Each $\mathbf{u}_{kj}$ is the precoder for stream $j$ of
user $k$, and has unit power ($\|\mathbf{u}_{kj}\|_2 = 1$, where $\|\cdot\|_2$ is the
Euclidean norm operator). These individual precoders together
form the $M \times L$ global transmitter precoder matrix
$\mathbf{U} = [\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_K]$. Let $p_{kj}$ be the power allocated to stream
$j$ of user $k$ and the downlink transmit power vector for user $k$
be $\mathbf{p}_k = [p_{k1}, p_{k2}, \ldots, p_{kL_k}]^T$, with $\mathbf{p} = [\mathbf{p}_1^T, \ldots, \mathbf{p}_K^T]^T$.
Define $\mathbf{P}_k = \mathrm{diag}\{\mathbf{p}_k\}$ and $\mathbf{P} = \mathrm{diag}\{\mathbf{p}\}$. The channel
between the transmitter and user $k$ is assumed flat and is
represented by the $N_k \times M$ matrix $\mathbf{H}_k^H$, where $(\cdot)^H$ indicates
the conjugate transpose operator. The resulting $N \times M$ channel
matrix is $\mathbf{H}^H$, with $\mathbf{H} = [\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_K]$. The transmitter
is assumed to know the channel perfectly.

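Based on this model, user $k$ receives a length $N_k$ vector

$$\mathbf{y}_k = \mathbf{H}_k^H \mathbf{U} \sqrt{\mathbf{P}}\, \mathbf{x} + \mathbf{n}_k, \tag{1}$$

where $\mathbf{n}_k$ consists of the additive white Gaussian noise
(AWGN) at the user's receive antennas with power $\sigma^2$; that
is, $E[\mathbf{n}_k \mathbf{n}_k^H] = \sigma^2 \mathbf{I}_{N_k}$, where $E[\cdot]$ represents the expectation
operator. To estimate its $L_k$ symbols $\mathbf{x}_k$, user $k$ processes
$\mathbf{y}_k$ with its $L_k \times N_k$ decoder matrix $\mathbf{V}_k^H$, resulting in

$$\hat{\mathbf{x}}_k^{DL} = \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{U} \sqrt{\mathbf{P}}\, \mathbf{x} + \mathbf{V}_k^H \mathbf{n}_k, \tag{2}$$

where the superscript $DL$ indicates the downlink.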

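The global receive filter $\mathbf{V}^H$ is a block diagonal decoder
matrix of dimension $L \times N$, $\mathbf{V} = \mathrm{diag}[\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_K]$,
where each $\mathbf{V}_k = [\mathbf{v}_{k1}, \ldots, \mathbf{v}_{kL_k}]$.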

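We make use of the dual virtual uplink, also illustrated
in Fig. 1, with the same channels between users and base
station. Let the uplink transmit power vector for user $k$ be
$\mathbf{q}_k = [q_{k1}, q_{k2}, \ldots, q_{kL_k}]^T$, with $\mathbf{q} = [\mathbf{q}_1^T, \ldots, \mathbf{q}_K^T]^T$, and
define $\mathbf{Q}_k = \mathrm{diag}\{\mathbf{q}_k\}$ and $\mathbf{Q} = \mathrm{diag}\{\mathbf{q}\}$. The transmit and
receive filters for user $k$ become $\mathbf{V}_k$ and $\mathbf{U}_k^H$, respectively. As
in the downlink, the precoder for the virtual uplink contains
columns with unit norm; that is, $\|\mathbf{v}_{kj}\|_2 = 1$. The received
vector at the base station and the estimated symbol vector for
user $k$ are

$$\mathbf{y} = \sum_{i=1}^{K} \mathbf{H}_i \mathbf{V}_i \sqrt{\mathbf{Q}_i}\, \mathbf{x}_i + \mathbf{n}, \tag{3}$$

$$\hat{\mathbf{x}}_k^{UL} = \sum_{i=1}^{K} \mathbf{U}_k^H \mathbf{H}_i \mathbf{V}_i \sqrt{\mathbf{Q}_i}\, \mathbf{x}_i + \mathbf{U}_k^H \mathbf{n}. \tag{4}$$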

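The noise term, $\mathbf{n}$, is again AWGN with $E[\mathbf{n}\mathbf{n}^H] = \sigma^2 \mathbf{I}_M$.

We assume that the modulated data symbols $\mathbf{x}$ are drawn
from a PSK constellation where each data symbol $x_i$ has
power $|x_i|^2 = 1$. Furthermore, the data symbols are independent
so that $E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}_L$. Also, noise and data are
independent such that $E[\mathbf{x}\mathbf{n}^H] = \mathbf{0}$. Finally, we define a
useful virtual uplink receive covariance matrix as

$$\mathbf{J} = E[\mathbf{y}\mathbf{y}^H] = \sum_{k=1}^{K} \mathbf{H}_k \mathbf{V}_k \mathbf{Q}_k \mathbf{V}_k^H \mathbf{H}_k^H + \sigma^2 \mathbf{I}_M = \mathbf{H}\mathbf{V}\mathbf{Q}\mathbf{V}^H\mathbf{H}^H + \sigma^2 \mathbf{I}_M. \tag{5}$$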
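As a quick numerical illustration of (5), the following sketch builds the virtual uplink covariance both as a per-user sum and in stacked matrix form and lets the two be compared. The dimensions, the noise power, the uniform power split, and the helper name `crandn` are assumptions chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, Nk, Lk = 4, 2, 2, 2      # illustrative sizes (assumed)
sigma2 = 0.1                    # noise power (assumed value)

def crandn(*shape):
    """i.i.d. complex Gaussian samples with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# H_k is M x N_k, so that H_k^H is the N_k x M downlink channel of user k.
H_blocks = [crandn(M, Nk) for _ in range(K)]

# V_k is N_k x L_k with unit-norm columns (virtual uplink precoders).
V_blocks = []
for _ in range(K):
    Vk = crandn(Nk, Lk)
    V_blocks.append(Vk / np.linalg.norm(Vk, axis=0))

# Uniform uplink powers with unit sum power (an arbitrary starting point).
q_blocks = [np.full(Lk, 1.0 / (K * Lk)) for _ in range(K)]

# Per-user sum form of (5).
J_sum = sigma2 * np.eye(M, dtype=complex)
for Hk, Vk, qk in zip(H_blocks, V_blocks, q_blocks):
    J_sum += Hk @ Vk @ np.diag(qk) @ Vk.conj().T @ Hk.conj().T

# Stacked form of (5): H = [H_1 ... H_K], V block diagonal, Q diagonal.
H = np.hstack(H_blocks)
V = np.zeros((K * Nk, K * Lk), dtype=complex)
for k, Vk in enumerate(V_blocks):
    V[k * Nk:(k + 1) * Nk, k * Lk:(k + 1) * Lk] = Vk
Q = np.diag(np.concatenate(q_blocks))
J_mat = H @ V @ Q @ V.conj().T @ H.conj().T + sigma2 * np.eye(M)
```

Both forms give the same Hermitian matrix, and its smallest eigenvalue is bounded below by the noise power, so $\mathbf{J}$ is always invertible for $\sigma^2 > 0$.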



III. Product of Mean Squared Errors 

Information theoretical approaches characterize the sum ca- 
pacity of the multiuser MIMO downlink or broadcast channel 
(BC) by solving the sum capacity of the equivalent uplink mul- 
tiple access channel (MAC) and applying a duality result [8], 
[20]. The resulting expression for the maximum sum rate in 
the K user MAC is 



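$$\max_{\{\mathbf{S}_k\}} \; \log_2 \det\left(\mathbf{I} + \frac{1}{\sigma^2}\sum_{k=1}^{K} \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H\right) \quad \text{s.t.} \quad \mathbf{S}_k \succeq 0, \; k = 1, \ldots, K, \quad \sum_{k=1}^{K} \mathrm{tr}[\mathbf{S}_k] \le P_{\max}, \tag{6}$$

where $\mathbf{S}_k \succeq 0$ indicates that $\mathbf{S}_k$ is a positive semi-definite
transmit covariance matrix for mobile user $k$ in the uplink.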
In this section, we approximate this sum rate in terms of each 
individual user's data rate. 



Consider the signal to interference plus noise ratio (SINR)
for stream $j$ belonging to user $k$ under the multiuser virtual
uplink model defined in Section II. Using (4) and finding the
average received signal power ($E[|\hat{x}_{kj}|^2]$) and interference-plus-noise
power corresponding to all other data streams and
AWGN, this stream achieves an SINR of

$$\gamma_{kj}^{UL} = \frac{\mathbf{u}_{kj}^H \mathbf{H}_k \mathbf{v}_{kj} q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{u}_{kj}}{\mathbf{u}_{kj}^H \mathbf{J}_{kj} \mathbf{u}_{kj}}, \tag{7}$$

where $\mathbf{J}_{kj} = \mathbf{J} - \mathbf{H}_k \mathbf{v}_{kj} q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H$ is the virtual uplink
interference-plus-noise receive covariance matrix. We approximate
the maximum rate for this stream as

$$R_{kj} = \log_2\left(1 + \gamma_{kj}^{UL}\right). \tag{8}$$

Under the central limit theorem, the interference-plus-noise
becomes Gaussian as the number of interfering streams increases,
making the approximation progressively better.

The goal of this work is to maximize the sum data rate 
subject to constraints on the total available power. Using 
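the approximation in (8), we formally state the optimization
problem as

$$\begin{aligned} (\mathbf{V}, \mathbf{q}) = \arg\max_{\mathbf{V}, \mathbf{q}} \; & \sum_{k=1}^{K} \sum_{j=1}^{L_k} \log_2\left(1 + \gamma_{kj}^{UL}\right) \\ \text{s.t.} \; & \|\mathbf{v}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \\ & q_{kj} \ge 0, \quad j = 1, \ldots, L_k, \\ & \|\mathbf{q}\|_1 = \sum_{k=1}^{K} \sum_{j=1}^{L_k} q_{kj} \le P_{\max}, \end{aligned} \tag{9}$$

where $\|\mathbf{q}\|_1$ is the 1-norm or the sum of all entries in $\mathbf{q}$.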

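We can see from (7) that the optimum linear receiver $\mathbf{u}_{kj}$
does not depend on any other columns of $\mathbf{U}$; furthermore, it
is the solution to the generalized eigenproblem

$$\mathbf{u}_{kj}^{opt} = \mathbf{e}_{\max}\left(\mathbf{H}_k \mathbf{v}_{kj} q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H, \; \mathbf{J}_{kj}\right), \tag{10}$$

where $\mathbf{e}_{\max}(\mathbf{A}, \mathbf{B})$ is the unit norm eigenvector $\mathbf{x}$ corresponding
to the largest eigenvalue $\lambda$ in the generalized eigenproblem
$\mathbf{A}\mathbf{x} = \lambda \mathbf{B}\mathbf{x}$. Within a normalizing factor, this solution is
equivalent to the MMSE receiver:

$$\mathbf{u}_{kj}^{opt} = \mathbf{J}^{-1} \mathbf{H}_k \mathbf{v}_{kj} \sqrt{q_{kj}}. \tag{11}$$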



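When using linear decoding with this MMSE receiver, the
MSE matrix for the virtual uplink is

$$\mathbf{E} = E\left[(\hat{\mathbf{x}} - \mathbf{x})(\hat{\mathbf{x}} - \mathbf{x})^H\right] = \mathbf{I}_L - \sqrt{\mathbf{Q}}\, \mathbf{V}^H \mathbf{H}^H \mathbf{J}^{-1} \mathbf{H} \mathbf{V} \sqrt{\mathbf{Q}}, \tag{12}$$

which follows from (11) and the system model assumptions
stated in Section II. Thus, the mean squared error for user $k$'s
$j$-th stream is

$$e_{kj} = 1 - q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{J}^{-1} \mathbf{H}_k \mathbf{v}_{kj}. \tag{13}$$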



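Now consider another optimization problem, minimizing the
product of mean squared errors (PMSE) under a sum power
constraint,

$$\begin{aligned} (\mathbf{V}, \mathbf{q}) = \arg\min_{\mathbf{V}, \mathbf{q}} \; & \prod_{k=1}^{K} \prod_{j=1}^{L_k} e_{kj} \\ \text{s.t.} \; & \|\mathbf{v}_{kj}\|_2 = 1, \quad k = 1, \ldots, K, \\ & q_{kj} \ge 0, \quad j = 1, \ldots, L_k, \\ & \|\mathbf{q}\|_1 = \sum_{k=1}^{K} \sum_{j=1}^{L_k} q_{kj} \le P_{\max}. \end{aligned} \tag{14}$$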

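Theorem 1: Under linear MMSE decoding at the base station,
the optimization problems defined by (9) and (14) are
equivalent problems.

Proof: Define the argument of the log term from (9) as
$a_{kj} \triangleq 1 + \gamma_{kj}^{UL}$. Using (7), we can rewrite $a_{kj}$ as

$$a_{kj} = \frac{\mathbf{u}_{kj}^H \mathbf{J} \mathbf{u}_{kj}}{\mathbf{u}_{kj}^H \mathbf{J}_{kj} \mathbf{u}_{kj}}. \tag{15}$$

It follows that by using the MMSE receiver from (11),

$$\begin{aligned} \frac{1}{a_{kj}} &= \frac{\mathbf{u}_{kj}^H \mathbf{J}_{kj} \mathbf{u}_{kj}}{\mathbf{u}_{kj}^H \mathbf{J} \mathbf{u}_{kj}} = 1 - \frac{\mathbf{u}_{kj}^H \mathbf{H}_k \mathbf{v}_{kj} q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{u}_{kj}}{\mathbf{u}_{kj}^H \mathbf{J} \mathbf{u}_{kj}} \\ &= 1 - \frac{q_{kj}^2 \left(\mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{J}^{-1} \mathbf{H}_k \mathbf{v}_{kj}\right)^2}{q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{J}^{-1} \mathbf{H}_k \mathbf{v}_{kj}} \\ &= 1 - q_{kj} \mathbf{v}_{kj}^H \mathbf{H}_k^H \mathbf{J}^{-1} \mathbf{H}_k \mathbf{v}_{kj} = e_{kj}. \end{aligned} \tag{16}$$

Thus, under linear MMSE decoding, the MSE and SINR for
stream $j$ belonging to user $k$ are related as

$$e_{kj} = \frac{1}{1 + \gamma_{kj}^{UL}}. \tag{17}$$

This relationship is similar to one shown for MMSE detection
in CDMA systems [21]. By applying (17) to (9), we see that

$$\sum_{k=1}^{K} \sum_{j=1}^{L_k} \log_2\left(1 + \gamma_{kj}^{UL}\right) = -\log_2\left(\prod_{k=1}^{K} \prod_{j=1}^{L_k} e_{kj}\right). \tag{18}$$

Since the constraints on $\mathbf{v}_{kj}$ and $q_{kj}$ are identical in (9) and
(14), the problem of maximizing sum rate in (9) is therefore
equivalent to minimizing the PMSE in (14). ■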

IV. PMSE Minimization Algorithm 

With the motivation of Section III in mind, we now develop
an algorithm to minimize the product of mean squared errors. 
The algorithm draws on previous work in minimizing the sum 
MSE [3], [4]. It operates by iteratively obtaining the downlink 
precoder matrix U and power allocations p and the virtual 
uplink precoder matrix V and power allocations q. Each step 
minimizes the objective function by modifying one of these 
four variables while leaving the remaining three fixed. 



A. Downlink Precoder 



TABLE I 

Iterative PMSE minimization algorithm 



For a fixed set of virtual uplink precoders $\mathbf{V}_k$ and power
allocation $\mathbf{q}$, the optimum virtual uplink decoder $\mathbf{U}$ is defined
by (11). Each $e_{kj}$ is minimized individually by this MMSE
receiver, thereby also minimizing the product of MSEs. This
$\mathbf{U}$ is normalized and used as the downlink precoder.

B. Downlink Power Allocation 

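The MSE duality derived in [3], [4] states that all achievable
MSEs in the uplink for a given $\mathbf{U}$, $\mathbf{V}$, and $\mathbf{q}$ (with sum power
constraint $\|\mathbf{q}\|_1 \le P_{\max}$), can also be achieved by a power
allocation $\mathbf{p}$ in the downlink where $\|\mathbf{p}\|_1 \le P_{\max}$.

In order to calculate the power allocation $\mathbf{p}$, we apply the
following result from [4]:

$$\mathbf{p} = \sigma^2 \left(\mathbf{D}^{-1} - \mathbf{\Psi}\right)^{-1} \mathbf{1}, \tag{19}$$

where $\mathbf{\Psi}$ is the $L \times L$ cross coupling matrix defined as

$$[\mathbf{\Psi}]_{ij} = \begin{cases} |\tilde{\mathbf{h}}_i^H \mathbf{u}_j|^2 = |\mathbf{u}_j^H \tilde{\mathbf{h}}_i|^2, & i \ne j, \\ 0, & i = j, \end{cases} \tag{20}$$

and

$$\mathbf{D} = \mathrm{diag}\left\{ \frac{\gamma_{11}^{UL}}{|\mathbf{v}_{11}^H \mathbf{H}_1^H \mathbf{u}_{11}|^2}, \ldots, \frac{\gamma_{KL_K}^{UL}}{|\mathbf{v}_{KL_K}^H \mathbf{H}_K^H \mathbf{u}_{KL_K}|^2} \right\}, \tag{21}$$

where $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{V} = [\tilde{\mathbf{h}}_1, \ldots, \tilde{\mathbf{h}}_L]$, $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_L]$, and $\mathbf{1}$ is
the all-ones vector of the required dimension.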
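The power allocation step can be sketched numerically as follows. The code assumes the SINR-balancing reading of (19)-(21): $\mathbf{D}$ holds the achieved uplink SINRs divided by the direct-link gains and $\mathbf{\Psi}$ holds the cross-link gains. Channel values, sizes, and variable names (`Ht`, `G`, `gamma_ul`) are assumptions for illustration and are not taken from [4].

```python
import numpy as np

rng = np.random.default_rng(2)
M, L = 4, 4                      # assumed sizes
sigma2, P_max = 0.2, 1.0         # assumed noise power and power budget

# Effective channels: column i of Ht stands for H_k v_kj of stream i.
Ht = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
q = np.full(L, P_max / L)        # uplink powers (uniform, for illustration)

# Uplink MMSE receivers, normalized to unit norm for use as downlink precoders.
J = Ht @ np.diag(q) @ Ht.conj().T + sigma2 * np.eye(M)
U = np.linalg.solve(J, Ht * np.sqrt(q))
U = U / np.linalg.norm(U, axis=0)

G = np.abs(Ht.conj().T @ U) ** 2         # G[i, j] = |h_i^H u_j|^2
sig = np.diag(G).copy()                  # direct-link gains |h_i^H u_i|^2

# Uplink SINR of stream i: interference uses |u_i^H h_j|^2 = G[j, i].
gamma_ul = q * sig / (G.T @ q - q * sig + sigma2)

Psi = G - np.diag(sig)                   # cross coupling matrix, zero diagonal
D = np.diag(gamma_ul / sig)
p = sigma2 * np.linalg.solve(np.linalg.inv(D) - Psi, np.ones(L))

# Downlink SINR of stream i: interference uses |h_i^H u_j|^2 = Psi[i, j].
gamma_dl = p * sig / (Psi @ p + sigma2)
```

Under these assumptions the downlink SINRs in `gamma_dl` match the uplink values in `gamma_ul`, and the downlink powers sum to the same total as the uplink powers, which is the property the duality relies on.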

C. Virtual Uplink Precoder 

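Given a fixed $\mathbf{U}$ and $\mathbf{p}$, the optimal decoders $\mathbf{V}_k$ are the
MMSE receivers:

$$\mathbf{V}_k = \mathbf{J}_k^{-1} \mathbf{H}_k^H \mathbf{U}_k \sqrt{\mathbf{P}_k}. \tag{22}$$

In this equation, $\mathbf{J}_k = \mathbf{H}_k^H \mathbf{U} \mathbf{P} \mathbf{U}^H \mathbf{H}_k + \sigma^2 \mathbf{I}_{N_k}$ is the receive
covariance matrix for user $k$. The optimum virtual uplink
precoders are then the normalized columns of $\mathbf{V}_k$.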
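A minimal sketch of this step is given below, assuming the standard MMSE form for each user's decoder (receive covariance inverse times the user's effective precoded channel). The precoders and powers are random placeholders rather than the output of the earlier steps, and all sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, Nk, Lk = 4, 2, 2, 2       # assumed sizes
L = K * Lk
sigma2 = 0.2                    # assumed noise power

def crandn(*shape):
    """i.i.d. complex Gaussian samples with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_blocks = [crandn(M, Nk) for _ in range(K)]   # H_k is M x N_k
U = crandn(M, L)
U /= np.linalg.norm(U, axis=0)                 # unit-norm downlink precoders
p = np.full(L, 1.0 / L)                        # downlink powers (placeholder)

V_blocks = []
for k, Hk in enumerate(H_blocks):
    Hk_dl = Hk.conj().T                        # N_k x M downlink channel H_k^H
    # Receive covariance of user k in the downlink.
    J_k = Hk_dl @ U @ np.diag(p) @ U.conj().T @ Hk + sigma2 * np.eye(Nk)
    cols = slice(k * Lk, (k + 1) * Lk)         # streams belonging to user k
    # MMSE decoder: J_k^{-1} H_k^H U_k sqrt(P_k).
    V_mmse = np.linalg.solve(J_k, (Hk_dl @ U[:, cols]) * np.sqrt(p[cols]))
    # Normalized columns become the virtual uplink precoders.
    V_blocks.append(V_mmse / np.linalg.norm(V_mmse, axis=0))
```

Each entry of `V_blocks` is an $N_k \times L_k$ matrix with unit-norm columns, ready to be used as the virtual uplink precoder in the next step.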

D. Virtual Uplink Power Allocation 

The power allocation problem on the virtual uplink solves
(14) with a fixed matrix $\mathbf{V}$. In the minimization of sum MSE,
the corresponding step is a convex optimization problem [4]. 
Unfortunately, it is well accepted that the power allocation 
subproblem in PMSE minimization (or equivalently, in sum 
rate maximization) is non-convex [14], [16], [17]. 

We thus employ numerical techniques to solve the power 
allocation subproblem, and use sequential quadratic program- 
ming (SQP) [22] to minimize the PMSE. SQP solves suc- 
cessive approximations of a constrained optimization problem 
and is guaranteed to converge to the optimum value for convex 
problems; however, in the case of this non-convex optimization 
problem, SQP can only guarantee convergence to a local 
minimum. We note that a similar approach was proposed 
in [17], where iterations of the sum rate maximization
problem are solved by local approximations of the non-convex 
sum-rate function as a (convex) geometric program [23]. 
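One way to sketch the SQP-based power allocation is with SciPy's SLSQP solver, which is a sequential least-squares quadratic programming method. This is an illustrative stand-in and not necessarily the solver used by the authors; the objective is the log of the PMSE computed from (13), and the channel matrix `Ht`, sizes, and starting point are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
M, L = 4, 4                      # assumed sizes
sigma2, P_max = 0.1, 1.0         # assumed noise power and power budget

# Column i of Ht stands for the fixed effective channel H_k v_kj of stream i.
Ht = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)

def log_pmse(q):
    """Log of the product of MSEs under MMSE reception, using (13)."""
    J = Ht @ np.diag(q) @ Ht.conj().T + sigma2 * np.eye(M)
    # quad[i] = h_i^H J^{-1} h_i
    quad = np.real(np.einsum("mi,mi->i", Ht.conj(), np.linalg.solve(J, Ht)))
    return np.sum(np.log(1.0 - q * quad))

q0 = np.full(L, P_max / L)       # uniform starting point
res = minimize(
    log_pmse,
    q0,
    method="SLSQP",
    bounds=[(0.0, P_max)] * L,                                   # q_kj >= 0
    constraints=[{"type": "ineq", "fun": lambda q: P_max - np.sum(q)}],
)

# Keep the solver output only if it actually improves on the starting point;
# the problem is non-convex, so only a local minimum can be expected.
q_opt = res.x if res.success and log_pmse(res.x) <= log_pmse(q0) else q0
```

Minimizing the log of the PMSE rather than the PMSE itself leaves the minimizer unchanged and tends to be better scaled numerically; different starting points may lead to different local minima.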

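In summary, the PMSE minimization algorithm, motivated
by a need to maximize sum data rate, follows the same steps
as the minimization of the SMSE. The iterative algorithm
keeps three of four parameters ($\mathbf{U}$, $\mathbf{p}$, $\mathbf{V}$, $\mathbf{q}$) fixed at each step
and obtains the optimal value of the fourth. Convergence of
the overall algorithm to a local minimum is guaranteed since
the PMSE objective function is non-increasing at each of the
four parameter update steps. Termination of the algorithm is
determined by the selection of the convergence threshold $\epsilon$.

Iteration (Table I):

1- Downlink Precoder: MMSE receivers from (11), with columns normalized as $\mathbf{u}_{kj} \leftarrow \mathbf{u}_{kj} / \|\mathbf{u}_{kj}\|_2$

2- Downlink Power Allocation via MSE duality: $\mathbf{p}$ from (19)

3- Virtual Uplink Precoder: MMSE receivers from (22), with columns normalized as $\mathbf{v}_{kj} \leftarrow \mathbf{v}_{kj} / \|\mathbf{v}_{kj}\|_2$

4- Virtual Uplink Power Allocation: $\mathbf{q} = \arg\min_{\mathbf{q}} \prod_{k=1}^{K} \prod_{j=1}^{L_k} e_{kj}$, s.t. $q_{kj} \ge 0$, $\|\mathbf{q}\|_1 \le P_{\max}$

5- Repeat 1-4 until $[\mathrm{PMSE}_{old} - \mathrm{PMSE}_{new}] / \mathrm{PMSE}_{old} < \epsilon$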

While neither the overall problem (14) nor the power
allocation subproblem are believed to be convex, simulations
suggest that changing the initialization point has a minimal
impact on the final solution; however, initialization with the
$\mathbf{U}$ and $\mathbf{p}$ found using the SMSE algorithm in [4] appears to
reduce the number of iterations required for convergence. A
summary of our proposed algorithm can be found in Table I.

V. Numerical Examples 

In this section, we present simulation results to illustrate the 
performance of the proposed algorithms. In all cases, the fad- 
ing channel is modelled as flat and Rayleigh using a channel 
matrix H composed of independent and identically distributed 
samples of a complex Gaussian process with zero mean and 
unit variance. The examples use a maximum transmit power 
of $P_{\max} = 1$; SNR is controlled by varying the receiver
noise power $\sigma^2$. The transmitter is assumed to have perfect
knowledge of the channel matrix H. 

A. Theoretical Performance 

Fig. 2. PMSE vs. DPC and orthogonalization-based methods

First, we examine the information theoretical performance
of the PMSE algorithm proposed in Section IV. That is, we
consider the spectral efficiency (measured in bps/Hz) that
could be achieved under ideal transmission by drawing transmit
symbols from a Gaussian codebook. Figure 2 illustrates
how the proposed scheme performs when compared to the
sum capacity for the broadcast channel (i.e. using dirty paper
coding (DPC) [11]) and to traditional linear precoding methods
based on channel orthogonalization (i.e. block diagonalization
(BD) and zero forcing (ZF) [13]). This simulation models
a $K = 2$ user system with $M = 4$ transmit antennas and
$N_k = 2$ or $N_k = 4$ receive antennas per user. The plot is
generated using 30000 channel realizations, with 5000 data
symbols per channel realization, and the convergence threshold 
for the PMSE algorithm is set as e = 10"^. 

In Fig. 2, we see a slight divergence in the performance of
the PMSE algorithm from the theoretical DPC bound at higher
SNR. This drop in spectral efficiency may be caused by the
non-convexity of the optimization problem, or it may suggest
a fundamental gap between the optimal DPC bound and the
achievable sum capacity under linear precoding. Nonetheless,
the PMSE algorithm still maintains a higher spectral efficiency
than the orthogonalization based schemes for $N_k = 2$. Furthermore,
the gap between the DPC bound and the PMSE precoder
is only 0.6 dB for $N_k = 4$, where BD and ZF schemes cannot
be applied due to constraints on the number of antennas.

B. Performance Using Practical Modulation 

The precoder and decoder design algorithm in Section IV
is derived independently of modulation depth, based on the 
assumption that transmitted symbols originate from a unit- 
energy PSK constellation. In this section, we consider two 
approaches in selecting the modulation scheme to maximize 
data rate. 

The naive approach selects the largest PSK constellation
of $b_{kj}$ bits per stream that satisfies a maximum bit error rate
(BER) requirement of $\beta_{kj}$. The satisfaction of this constraint
is determined using a closed form BER approximation [24],

$$\mathrm{BER}_{PSK}(\gamma) \approx c_1 \exp\left(\frac{-c_2 \gamma}{2^{c_3 b} - c_4}\right). \tag{23}$$

We apply the least aggressive of the bounds proposed in [24]
by using the values $c_1 = 0.25$, $c_2 = 8$, $c_3 = 1.94$, and $c_4 = 0$.
We note that this approximation only holds for $b \ge 2$; as such,
the following exact expression should be used for BPSK:

$$\mathrm{BER}_{BPSK}(\gamma) = \frac{1}{2} \mathrm{erfc}\left(\sqrt{\gamma}\right). \tag{24}$$



The BPSK expression can be used as a test of feasibility for 
the specified BER target; if the resulting BER under BPSK 
modulation is higher than $\beta_{kj}$, then we have two options:
either declare the BER target infeasible, or transmit using the 
lowest modulation depth available (i.e. BPSK). In this work, 
we have elected to transmit using BPSK whenever the PMSE 
stage has allocated power to the data stream. Future work 
may consider either partial or complete non-transmission to 
implement power saving while strictly achieving the desired 
BER target. 
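The naive selection rule can be sketched as follows. The constants are those quoted in the text; the function names, the cap `max_bits`, and the `stream_active` flag (standing in for "the PMSE stage has allocated power to the data stream") are hypothetical names introduced for illustration. SINR values are linear, not in dB.

```python
import math

C1, C2, C3, C4 = 0.25, 8.0, 1.94, 0.0    # constants quoted in the text

def ber_psk(sinr, bits):
    """Approximate BER of a 2**bits-PSK constellation at a linear SINR."""
    if bits == 1:
        # BPSK expression, used because the bound is stated for bits >= 2.
        return 0.5 * math.erfc(math.sqrt(sinr))
    return C1 * math.exp(-C2 * sinr / (2.0 ** (C3 * bits) - C4))

def naive_bits(sinr, ber_target, max_bits=8, stream_active=True):
    """Largest number of bits meeting the BER target, with a BPSK fallback."""
    if not stream_active:
        return 0                  # no power allocated, nothing transmitted
    best = 1                      # fallback: BPSK even if the target is missed
    for b in range(1, max_bits + 1):
        if ber_psk(sinr, b) <= ber_target:
            best = b
    return best
```

For example, `naive_bits(1000.0, 1e-3)` evaluates to 5 under these constants, while a very low SINR such as `naive_bits(0.01, 1e-3)` falls back to 1 bit (BPSK) even though the target is not met.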

The naive approach is quite conservative in that there
may be a large gap between the BER requirement and BER
achieved for each channel realization. We suggest a probabilistic
bit allocation scheme that switches between $b_{kj}$ bits
(as determined by the naive approach) and $b_{kj} + 1$ bits with
probability $p_{kj} = [\beta_{kj} - \mathrm{BER}_{b_{kj}}] / [\mathrm{BER}_{b_{kj}+1} - \mathrm{BER}_{b_{kj}}]$.
This modulation strategy may not be appropriate for systems
requiring instantaneous satisfaction of BER constraints; however,
the probabilistic method will still achieve the desired
BER in the long-term average over channel realizations.
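A small sketch of the probabilistic scheme is shown below, under the reading that the switching probability is the probability of using the larger constellation. With that reading, the mixture of the two BER values equals the target. Function names and the clipping behaviour for edge cases are assumptions added for illustration.

```python
import math
import random

C1, C2, C3, C4 = 0.25, 8.0, 1.94, 0.0    # constants quoted in the text

def ber_psk(sinr, bits):
    """Approximate BER of a 2**bits-PSK constellation at a linear SINR."""
    if bits == 1:
        return 0.5 * math.erfc(math.sqrt(sinr))
    return C1 * math.exp(-C2 * sinr / (2.0 ** (C3 * bits) - C4))

def switch_probability(sinr, bits, ber_target):
    """Probability of sending bits + 1 instead of bits on this stream."""
    low = ber_psk(sinr, bits)
    high = ber_psk(sinr, bits + 1)
    if low > ber_target:
        return 0.0                # target already missed (BPSK fallback case)
    return min(1.0, (ber_target - low) / (high - low))

def probabilistic_bits(sinr, bits, ber_target, rng=random):
    """Draw the number of bits for one transmission."""
    if rng.random() < switch_probability(sinr, bits, ber_target):
        return bits + 1
    return bits

# Example: a stream where the naive rule selects 5 bits at a 1e-3 target.
prob = switch_probability(1000.0, 5, 1e-3)
mixed_ber = (1 - prob) * ber_psk(1000.0, 5) + prob * ber_psk(1000.0, 6)
```

In the example, `mixed_ber` equals the target by construction, which illustrates why the scheme meets the BER requirement only on average and not for every transmission.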

Figure 3 shows the sum rate achieved in the same system
configuration as described above ($K = 2$, $M = 4$, $N_k = 2$)
with the additional required specification of Lk = 2 data 
streams per user and a target bit error rate of (3kj — 10^^. The 
plot illustrates the average number of bits per transmission for 
user 1 ; due to symmetry, the corresponding plot for user 2 is 
identical. Note that in contrast to Fig. 2 (which shows the sum
capacity under ideal Gaussian coding), the sum rate in Fig. 3
is the average number of bits transmitted in each realization 
using symbols from a PSK constellation. 

In Fig. 3, we also consider using the naive PSK modulation
scheme for the PMSE precoder and the SMSE precoder 
designed in [4]. Examination of this plot reveals that using the 
PMSE criterion is justified at practical SNR values with im- 
provements of approximately one bit per transmission near 15 
dB. Furthermore, using the probabilistic modulation scheme 
(designated "PMSE-P") yields an additional improvement of 
more than half a bit per transmission across all SNR values. 

In Fig. 4, we plot average BER versus SNR for the same
system configuration as in Fig. 3. This plot illustrates how the
naive bit allocation algorithm attempts to achieve the target 
BER of 10^^ for all data streams under PMSE, but also 
overshoots the target, converging to a BER of approximately 
5 X 10^**. This can be attributed to the looseness of the BER 
bound, as discussed above. In contrast, the probabilistic rate 
allocation algorithm not only increases the rate, as shown 
in Fig. 3, but also converges to a BER that is much closer
to the desired target BER. The remaining gap between the
actual BER achieved and the target BER can be attributed to
looseness in the approximations of (23) and (24).

VI. Conclusions 

In this paper, we have considered the problem of designing 
an iterative method for maximizing bit rates in the multiuser 
MIMO downlink. Previous work in the multiuser downlink 
has focused largely on added reliability (minimizing SMSE),
and not on maximizing the data rate. We have designed a
solution for a general MIMO system, where the number of
users, base station antennas, mobile antennas, and streams
transmitted are only constrained by resolvability of the data
symbols. Our proposed solution uses the SINR duality results
from previous work in minimizing SMSE. The product of
the MSEs for all streams is minimized under a sum power
constraint; this is achieved by employing a known uplink-downlink
duality of MSEs. We also presented an adaptive
modulation scheme to realize these gains in rate in a practical
system. The resulting SINR on each data stream is then used
to select an appropriate PSK constellation. Simulations verify
that significantly increased data rates can be achieved while
meeting given BER constraints.

Fig. 3. Sum rate vs. SNR for user 1

Fig. 4. BER vs. SNR for user 1